Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Reinforcement learning from human feedback (RLHF) is a recent technique to
improve the quality of the text generated by a language model, making it closer
to what humans would generate. A core ingredient in RLHF's success in aligning
and improving large language models (LLMs) is its reward model, trained using
human feedback on model outputs. In machine translation (MT), where metrics
trained from human annotations can readily be used as reward models, recent
methods using minimum Bayes risk decoding and reranking have succeeded in
improving the final quality of translation. In this study, we comprehensively
explore and compare techniques for integrating quality metrics as reward models
into the MT pipeline. These include using the reward model for data
filtering, during training through RL, and at inference time through
reranking, and we assess the effect of combining them in a unified
approach. Our experiments, conducted across multiple translation tasks,
underscore the crucial role of effective data filtering, based on
estimated quality, in harnessing the full potential of RL for enhancing
MT quality. Furthermore, our findings demonstrate the effectiveness of
combining RL training with reranking, which yields substantial
improvements in translation quality.
Comment: 14 pages, work in progress
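As an illustrative sketch of the inference-time reranking this abstract
describes, below is minimal Python code for minimum Bayes risk (MBR)
decoding. The `toy_utility` function is a hypothetical stand-in for a
trained quality metric such as COMET, and the candidate list would in
practice come from sampling the translation model; this sketches the
general technique, not the paper's implementation.

```python
def mbr_decode(candidates, utility):
    """Minimum Bayes risk decoding: pick the candidate with the highest
    expected utility, estimated against the other candidates, which act
    as pseudo-references."""
    best, best_score = None, float("-inf")
    for i, hyp in enumerate(candidates):
        refs = [r for j, r in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in refs) / max(len(refs), 1)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Toy utility standing in for a learned metric: Jaccard word overlap.
def toy_utility(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / max(len(h | r), 1)

candidates = ["the cat sat", "a cat sat down", "the cat sat down"]
print(mbr_decode(candidates, toy_utility))  # -> "the cat sat down"
```

With a learned metric as the utility, the same loop scores each
hypothesis against the pool, so the selected translation is the one the
reward model expects to be best on average.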
Sparse Continuous Distributions and Fenchel-Young Losses
Exponential families are widely used in machine learning, including many
distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet,
Poisson, and categorical distributions via the softmax transformation).
Distributions in each of these families have fixed support. In contrast, for
finite domains, recent work on sparse alternatives to softmax (e.g.,
sparsemax, α-entmax, and fusedmax) has led to distributions with varying
support.
This paper develops sparse alternatives to continuous distributions, based on
several technical contributions: First, we define Ω-regularized
prediction maps and Fenchel-Young losses for arbitrary domains (possibly
countably infinite or continuous). For linearly parametrized families, we show
that minimization of Fenchel-Young losses is equivalent to moment matching of
the statistics, generalizing a fundamental property of exponential families.
When Ω is a Tsallis negentropy with parameter α, we obtain
"deformed exponential families," which include α-entmax and sparsemax
(α = 2) as particular cases. For quadratic energy functions, the resulting
densities are β-Gaussians, an instance of elliptical distributions that
contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov
densities, and for which we derive closed-form expressions for the variance,
Tsallis entropy, and Fenchel-Young loss. When Ω is a total variation or
Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally,
we introduce continuous-domain attention mechanisms, deriving efficient
gradient backpropagation algorithms for α ∈ {1, 2}. Using
these algorithms, we demonstrate our sparse continuous distributions for
attention-based audio classification and visual question answering, showing
that they allow attending to time intervals and compact regions.
Comment: JMLR 2022 camera-ready version. arXiv admin note: text overlap
with arXiv:2006.0721
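For context on the finite-domain case this abstract builds on, here is a
minimal NumPy sketch of sparsemax, which the paper recovers as the α = 2
case; the continuous-domain constructions generalize this simplex
projection. This is an illustrative sketch, not code from the paper.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of a score vector z onto the probability
    simplex (Martins & Astudillo, 2016). Unlike softmax, the output can
    contain exact zeros, so the support varies with z."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                # scores in decreasing order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = k[1 + k * z_sorted > cumsum]     # support condition
    k_star = support[-1]                       # support size
    tau = (cumsum[k_star - 1] - 1.0) / k_star  # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, -1.0]))  # -> [1. 0. 0.], exactly sparse
print(sparsemax([0.5, 0.4, 0.1]))   # already on the simplex, returned as-is
```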
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Many recent advances in natural language generation have been fueled by
training large language models on internet-scale data. However, this paradigm
can lead to models that generate toxic, inaccurate, and unhelpful content, and
automatic evaluation metrics often fail to identify these behaviors. As models
become more capable, human feedback is an invaluable signal for evaluating and
improving models. This survey aims to provide an overview of the recent
research that has leveraged human feedback to improve natural language
generation. First, we introduce an encompassing formalization of feedback, and
identify and organize existing research into a taxonomy following this
formalization. Next, we discuss how feedback can be characterized by its
format and objective, and cover the two approaches proposed for using it
(either during training or during decoding): using the feedback directly,
or training feedback models.
We also discuss existing datasets for human-feedback data collection, and
concerns surrounding feedback collection. Finally, we provide an overview of
the nascent field of AI feedback, which exploits large language models to make
judgments based on a set of principles and minimize the need for human
intervention.
Comment: Work in progress
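As one concrete instance of the "training feedback models" approach
mentioned above, the sketch below implements the Bradley-Terry pairwise
preference loss commonly used to train reward models from human
comparisons; it is a standard formulation assumed here for illustration,
not one prescribed by the survey.

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected), averaged over pairs,
    computed via logaddexp for numerical stability."""
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Toy usage: scalar rewards a model assigns to the human-preferred
# vs. dispreferred outputs for two prompts.
print(preference_loss([1.5, 0.2], [0.3, 0.4]))  # smaller when preferred outputs score higher
```

Minimizing this loss pushes the feedback model to score human-preferred
outputs higher, which is what makes it usable as a reward signal for
training or for reranking at decoding time.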
The Place of the Political in the Portuguese Parliament: The Case of Medically Assisted Procreation
Master's dissertation in Sociology ("As Sociedades Nacionais Perante os
Processos da Globalização"), Faculdade de Economia da Universidade de
Coimbra, 2009.